59 research outputs found

    Evaluating directive-based programming models on Wave Propagation Kernels

    HPC systems have become mandatory to tackle the ever-increasing challenges imposed by new exploration areas around the world. The requirement for more HPC resources depends on the complexity of the area under exploration, yet the larger the HPC system, the higher its energy consumption. The drive to reduce overall power consumption in HPC facilities has led technology vendors to introduce many-core devices and heterogeneous computing into supercomputers, forcing exploration codes to be ported to these new architectures. Since the Oil & Gas industry carries more than 30 years of legacy code, the effort to adapt it could be huge. To ease this transition, several programming models have emerged: high-level directive-based models such as OpenMP, OpenACC, and OmpSs let developers express parallelism through compiler directives, relieving them from manually decomposing and managing the parallel regions. The results show that a parallel code for current heterogeneous HPC architectures can be obtained in a few hours or days of work, with a speedup of at least an order of magnitude over the sequential code. However, our parallelization targets a single computational node; a wider study evaluating the cost of porting and parallelizing across computational nodes remains pending. The authors thank Repsol for permission to publish the present research, carried out at the Repsol-BSC Research Center. This work has received funding from the European Union's Horizon 2020 Programme (2014-2020) and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP) under the HPC4E Project (www.hpc4e.eu), grant agreement no. 689772.
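
    As a hedged illustration of how a directive-based model exposes this kind of kernel to an accelerator, the sketch below annotates a simple second-order acoustic stencil with OpenACC; the kernel, array names and coefficients are illustrative and are not taken from the paper's code. The same loop nest could equally carry an OpenMP "parallel for collapse(2)" directive.

    /* Minimal sketch: a 2nd-order acoustic update annotated with OpenACC.
       Array names, sizes and coefficients are illustrative only. */
    void wave_step(float *restrict p_new, const float *restrict p_cur,
                   const float *restrict p_old, const float *restrict vel,
                   int nx, int ny, float dt, float dx)
    {
        const float c = (dt * dt) / (dx * dx);

        /* One directive exposes the whole loop nest to the device; the
           compiler handles decomposition, data movement and scheduling. */
        #pragma acc parallel loop collapse(2) \
            copyin(p_cur[0:nx*ny], p_old[0:nx*ny], vel[0:nx*ny]) \
            copy(p_new[0:nx*ny])
        for (int j = 1; j < ny - 1; ++j) {
            for (int i = 1; i < nx - 1; ++i) {
                int k = j * nx + i;
                float lap = p_cur[k - 1] + p_cur[k + 1]
                          + p_cur[k - nx] + p_cur[k + nx]
                          - 4.0f * p_cur[k];
                p_new[k] = 2.0f * p_cur[k] - p_old[k]
                         + vel[k] * vel[k] * c * lap;
            }
        }
    }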

    Job scheduling considering best-effort and soft real-time applications on non-dedicated clusters

    As Networks Of Workstations (NOWs) emerge as a viable platform for a wide range of workloads, new scheduling approaches are needed to allocate the available resources among competing applications. New workload types introduce high uncertainty into the predictability of the system, hindering the applicability of job scheduling strategies. A new kind of parallel application, Soft Real-Time (SRT), has appeared in both business and scientific domains; together with new SRT desktop applications, it makes prediction an even harder goal by adding inherent complexity to the estimation procedures. In previous work, we introduced an estimation engine into our job scheduling system, CISNE. In this work, the off-line estimation engine is extended with two new kernels, both SRT-aware. Experimental results confirm that the simulated kernel performs better than the analytical one and show a maximum average prediction error deviation of 20%. VIII Workshop de Procesamiento Distribuido y Paralelo. Red de Universidades con Carreras en Informática (RedUNCI).
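
    As a hedged illustration of how a prediction error deviation such as the 20% figure above can be computed, the sketch below compares predicted and observed job turnaround times; the data and function names are hypothetical and unrelated to CISNE's actual interfaces.

    /* Hedged sketch: average relative deviation between predicted and observed
       job turnaround times, the kind of figure reported as a 20% maximum
       average deviation above. Data and names are hypothetical. */
    #include <math.h>
    #include <stdio.h>

    static double avg_prediction_deviation(const double *predicted,
                                           const double *observed, int njobs)
    {
        double sum = 0.0;
        for (int i = 0; i < njobs; ++i)
            sum += fabs(predicted[i] - observed[i]) / observed[i];  /* observed > 0 assumed */
        return sum / njobs;
    }

    int main(void)
    {
        double predicted[] = { 110.0, 420.0,  95.0 };   /* seconds, illustrative */
        double observed[]  = { 100.0, 400.0, 120.0 };
        printf("average deviation: %.1f%%\n",
               100.0 * avg_prediction_deviation(predicted, observed, 3));
        return 0;
    }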

    Optimizing Fully Anisotropic Elastic Propagation on 2nd Generation Intel Xeon Phi Processors

    This work presents several optimization strategies evaluated and applied to an elastic wave propagation engine, based on a Fully Staggered Grid, running on the latest Intel Xeon Phi processors, the second generation of the product (code-named Knights Landing). Our fully optimized code shows a speedup of about 4x compared with the same algorithm optimized for the previous-generation processor. The authors also thank Repsol for permission to publish the present research, carried out at the Repsol-BSC Research Center. This work has received funding from the European Union's Horizon 2020 Programme (2014-2020) and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP) under the HPC4E Project (www.hpc4e.eu), grant agreement no. 689772. * Other brands and names are the property of their respective owners.
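
    A hedged sketch of two optimizations typical of Knights Landing tuning, spatial loop blocking and explicit SIMD vectorization for the 512-bit units, applied here to a generic 2-D Laplacian rather than the paper's fully staggered grid kernel; names and the tile size are illustrative.

    /* Hedged sketch: spatial blocking plus explicit SIMD on a generic 2-D
       Laplacian. Tiling the fast dimension keeps the three rows of a tile
       resident in cache while j advances; the simd pragma encourages use of
       the 512-bit vector units. Not the paper's FSG kernel. */
    #define BX 1024                         /* tile width, illustrative */

    void laplacian(float *restrict out, const float *restrict in,
                   int nx, int ny, float inv_h2)
    {
        for (int ib = 1; ib < nx - 1; ib += BX) {           /* tile fast dim  */
            int iend = ib + BX < nx - 1 ? ib + BX : nx - 1;
            for (int j = 1; j < ny - 1; ++j) {              /* sweep the tile */
                #pragma omp simd
                for (int i = ib; i < iend; ++i) {
                    int k = j * nx + i;
                    out[k] = (in[k - 1] + in[k + 1] + in[k - nx] + in[k + nx]
                              - 4.0f * in[k]) * inv_h2;
                }
            }
        }
    }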

    Toward an automatic full-wave inversion: Synthetic study cases

    Full-waveform inversion (FWI) in seismic scenarios continues to be a complex procedure for subsurface imaging that might require extensive human interaction in terms of model setup, constraints, and data preconditioning. The underlying reason is the strong nonlinearity of the problem that forces the addition of a priori knowledge (or bias) in order to obtain geologically sound results. In particular, when the use of a long-offset receiver is not possible or may not favor the reconstruction of the fine structure of the model, one needs to rely on reflection data. As a consequence, the inversion process is more prone to becoming stuck in local minima. Nevertheless, misfit functionals can be devised that can either cope with missing long-wavenumber features of initial models (e.g., cross-correlation-based misfit) or invert reflection-dominated data whenever the models are sufficiently good (e.g., normalized offset-limited least-squares misfit). By combining both, high-frequency data content with poor initial models can be successfully inverted. If one can figure out simple parameterizations for such functionals, the amount of uncertainty and manual work related to tuning FWI would be substantially reduced. Thus, FWI might become a semiautomatized imaging tool. We want to thank Repsol for funding this research by means of the Aurora project. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 644202. Additionally, the research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP) under the HPC4E Project (www.hpc4e.eu), grant agreement no. 689772. We acknowledge Chevron for the dataset that was used in our second example.
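
    As a hedged illustration of the two misfit families mentioned above, the LaTeX block below writes down one common parameterization of each: a trace-normalized, offset-limited least-squares misfit and a global cross-correlation norm. The symbols (o_max, d^syn, d^obs) are introduced here for the example; the exact functionals used in the paper may differ.

    % One common parameterization of each misfit family (symbols introduced
    % here; the paper's exact functionals may differ).
    % Offset-limited, trace-normalized least-squares misfit:
    \[
      J_{\mathrm{LS}}(\mathbf{m}) = \frac{1}{2}
      \sum_{s,r \,:\, |x_r - x_s| \le o_{\max}}
      \int \left(
        \frac{d^{\mathrm{syn}}_{s,r}(t;\mathbf{m})}{\|d^{\mathrm{syn}}_{s,r}\|}
        - \frac{d^{\mathrm{obs}}_{s,r}(t)}{\|d^{\mathrm{obs}}_{s,r}\|}
      \right)^{2} \mathrm{d}t
    \]
    % Cross-correlation-based misfit, which rewards phase alignment and is less
    % sensitive to missing long-wavenumber features of the initial model:
    \[
      J_{\mathrm{XC}}(\mathbf{m}) = -\sum_{s,r}
      \frac{\int d^{\mathrm{syn}}_{s,r}(t;\mathbf{m})\, d^{\mathrm{obs}}_{s,r}(t)\,\mathrm{d}t}
           {\|d^{\mathrm{syn}}_{s,r}\|\,\|d^{\mathrm{obs}}_{s,r}\|}
    \]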

    Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators

    In this paper, we evaluate the error criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as far as imprecise computing is concerned, simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applications' output, correlating the number of corrupted elements with their spatial locality. We also provide the mean relative error (dataset-wise) to evaluate radiation-induced error magnitude. We apply the selected metrics to experimental results obtained in various radiation test campaigns totalling more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while the Xeon Phi is more reliable when executing particle interactions solved through Finite Difference Methods. Finally, iterative stencil operations seem the most reliable on both architectures. This work was supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research project grant 99999.007556/2015-02, the EU H2020 Programme, and MCTI/RNP-Brazil under the HPC4E Project, grant agreement no. 689772. Tested K40 boards were donated thanks to Steve Keckler, Timothy Tsai, and Siva Hari from NVIDIA.
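
    A hedged sketch of the kind of dataset-wise metrics described above: comparing an accelerator's output against a golden copy, counting corrupted elements and reporting their mean relative error. Names and the mismatch threshold are illustrative, not the paper's actual tooling.

    /* Hedged sketch: dataset-wise error metrics. Count elements that differ
       from a golden copy beyond a threshold and report their mean relative
       error. Threshold and names are illustrative. */
    #include <math.h>
    #include <stddef.h>

    typedef struct { size_t corrupted; double mean_rel_error; } err_report;

    err_report compare_output(const double *out, const double *golden, size_t n)
    {
        const double tol = 1e-12;          /* mismatch threshold, illustrative */
        err_report r = { 0, 0.0 };
        for (size_t i = 0; i < n; ++i) {
            double diff = fabs(out[i] - golden[i]);
            if (diff > tol) {
                r.corrupted++;
                if (golden[i] != 0.0)
                    r.mean_rel_error += diff / fabs(golden[i]);
            }
        }
        if (r.corrupted > 0)
            r.mean_rel_error /= (double)r.corrupted;
        return r;
    }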

    Applying backfilling over a non-dedicated cluster

    The resource utilization level in the open laboratories of several universities has been shown to be very low. Our aim is to take advantage of those idle resources for parallel computation without disturbing the local load. In order to provide a system that lets us execute parallel applications in such a non-dedicated cluster, we use an integral scheduling system that considers both Space and Time sharing concerns. For the Time Sharing (TS) aspect, we use a technique based on the communication-driven coscheduling principle. This kind of TS system has implications for the Space Sharing (SS) system that force us to modify the way job scheduling is traditionally done. In this paper, we analyze the relation between the TS and SS systems in a non-dedicated cluster. As a consequence of this analysis, we propose a new technique, termed 3DBackfilling, which applies the well-known SS technique of backfilling to an environment where the MultiProgramming Level (MPL) of the parallel applications is greater than one. In addition, 3DBackfilling considers the requirements of the local workload running on each node. Our proposal was evaluated in a PVM/MPI Linux cluster and compared with several more traditional SS policies applied to non-dedicated environments. VI Workshop de Procesamiento Distribuido y Paralelo (WPDP). Red de Universidades con Carreras en Informática (RedUNCI).
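
    For context, a hedged sketch of the classic backfilling admission test that 3DBackfilling generalizes: a waiting job may jump ahead only if it fits in the currently idle nodes and finishes before the reservation of the job at the head of the queue. The job structure and fields are illustrative, not taken from the CISNE system.

    /* Hedged sketch: the classic backfilling admission test. A candidate may
       start ahead of its turn only if it fits in the idle nodes and its
       estimated completion does not delay the reserved start of the job at
       the head of the queue. Struct and fields are illustrative. */
    typedef struct {
        int    nodes_requested;
        double runtime_estimate;            /* estimated run time, seconds */
    } job_t;

    int can_backfill(const job_t *candidate, int idle_nodes,
                     double now, double head_job_start_time)
    {
        if (candidate->nodes_requested > idle_nodes)
            return 0;                                /* not enough free nodes */
        if (now + candidate->runtime_estimate > head_job_start_time)
            return 0;                                /* would delay head job  */
        return 1;                                    /* safe to start now     */
    }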

    Acceleration strategies for elastic full waveform inversion workflows in 2D and 3D

    Full waveform inversion (FWI) is one of the most challenging procedures to obtain quantitative information of the subsurface. For elastic inversions, when both compressional and shear velocities have to be inverted, the algorithmic issue also becomes a computational challenge due to the high cost of modelling elastic rather than acoustic waves. This shortcoming has been moderately mitigated by using high-performance computing to accelerate 3D elastic FWI kernels. Nevertheless, there is room in FWI workflows for obtaining large speedups at the cost of proper grid pre-processing and data decimation techniques. In the present work, we show how, by making full use of frequency-adapted grids, composite shot lists and a novel dynamic offset control strategy, we can reduce the compute time by several orders of magnitude while improving the convergence of the method in the studied cases, regardless of the forward and adjoint compute kernels used. The authors thank REPSOL for the permission to publish the present research and for funding through the AURORA project. J. Kormann also thankfully acknowledges the computer resources, technical expertise and assistance provided by the Barcelona Supercomputing Center - Centro Nacional de Supercomputación together with the Spanish Supercomputing Network (RES) through grant FI-2014-2-0009. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 644202. The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP) under the HPC4E Project (www.hpc4e.eu), grant agreement no. 689772. We further want to thank the Editor Clint N. Dawson for his help, and Andreas Fichtner and an anonymous reviewer for their comments and suggestions to improve the manuscript.
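
    A hedged sketch of the standard dispersion argument behind frequency-adapted grids; the symbols (v_min, f_max, n_ppw) are introduced here for the example and the constants are not taken from the paper.

    % Dispersion criterion behind frequency-adapted grids: with minimum
    % velocity v_min, maximum inverted frequency f_max and a points-per-
    % wavelength requirement n_ppw, the grid spacing must satisfy
    \[
      h \;\le\; \frac{v_{\min}}{n_{\mathrm{ppw}}\, f_{\max}} ,
    \]
    % so a 3-D grid has $O(f_{\max}^{3})$ points and, with the time step bound
    % by a CFL condition ($\Delta t \propto h$), one simulation costs roughly
    % $O(f_{\max}^{4})$. Running the low-frequency inversion stages on coarse,
    % frequency-adapted grids is therefore where most of the savings come from.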

    3D seismic imaging through reverse-time migration on homogeneous and heterogeneous multi-core processors

    Reverse-Time Migration (RTM) is a state-of-the-art technique in seismic acoustic imaging because of the quality and integrity of the images it provides. Oil and gas companies trust RTM with crucial decisions on multi-million-dollar drilling investments. But RTM requires vastly more computational power than its predecessor techniques, and this has somewhat hindered its practical success. On the other hand, although multi-core architectures promise to deliver unprecedented computational power, little attention has been devoted to mapping RTM efficiently onto them. In this paper, we present a mapping of the RTM computational kernel to the IBM Cell/B.E. processor that reaches close-to-optimal performance. The kernel proves to be memory-bound and achieves 98% utilization of the peak memory bandwidth. Our Cell/B.E. implementation outperforms a traditional processor (PowerPC 970MP) in terms of performance (with a 15.0× speedup) and energy efficiency (with a 10.0× increase in the GFlops/W delivered). It is also, to the best of our knowledge, the fastest RTM implementation available. These results increase the practical usability of RTM, and the RTM-Cell/B.E. combination proves to be a strong competitor in the seismic arena.
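
    As a hedged illustration of the low arithmetic intensity that makes RTM kernels memory-bandwidth bound, the sketch below shows the zero-lag cross-correlation imaging condition at the core of RTM; function and array names are illustrative, and this is not the paper's Cell/B.E. code.

    /* Hedged sketch: the zero-lag cross-correlation imaging condition at the
       core of RTM. Each point performs one multiply-add per three loads and
       one store, the low arithmetic intensity characteristic of memory-
       bandwidth-bound kernels. Names are illustrative. */
    #include <stddef.h>

    void imaging_condition(float *restrict image,
                           const float *restrict src_wf,  /* forward-propagated source wavefield    */
                           const float *restrict rcv_wf,  /* backward-propagated receiver wavefield */
                           size_t npoints)
    {
        for (size_t i = 0; i < npoints; ++i)
            image[i] += src_wf[i] * rcv_wf[i];
    }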

    Impact of parallel programming models and CPU frequency scaling on the energy consumption of HPC systems

    Energy consumption has become one of the greatest challenges in the field of high-performance computing (HPC). The energy cost incurred by a supercomputer over the lifetime of the installation is similar to its acquisition cost. Thus, besides its environmental impact, energy is a limiting factor for HPC. Our line of research aims to reduce the energy consumption of parallel computing systems through modifications to the applications' algorithms. In this article, we analyse the energy consumption of parallel applications, looking for a possible influence (on consumption) of the shared-memory (OpenMP) and message-passing (MPI) parallel programming paradigms, and its variation at different CPU frequency scaling levels. The results show that the programming model has a significant impact on the energy consumption of computing systems, and that reducing the CPU frequency does not always lead to lower consumption and may even increase it. We believe this study can be an important starting point for future work in the area. Presented at the X Workshop Procesamiento Distribuido y Paralelo (WPDP). Red de Universidades con Carreras en Informática (RedUNCI).
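
    As a hedged sketch of how package energy can be sampled around a parallel region on Linux via the RAPL powercap sysfs interface, the code below reads the energy counter before and after a workload; the sysfs path is the usual intel-rapl location but may differ between machines, and this is not the measurement infrastructure used in the study.

    /* Hedged sketch: sampling package energy around a workload via the Linux
       RAPL powercap interface. The counter wraps around, so the final check
       is only a sketch; the sysfs path may differ per machine. */
    #include <stdio.h>

    static long long read_energy_uj(void)
    {
        long long uj = -1;
        FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
        if (f) {
            if (fscanf(f, "%lld", &uj) != 1)
                uj = -1;
            fclose(f);
        }
        return uj;
    }

    int main(void)
    {
        long long before = read_energy_uj();

        /* ... run the OpenMP or MPI workload under test here ... */

        long long after = read_energy_uj();
        if (before >= 0 && after >= before)
            printf("package energy: %.3f J\n", (after - before) / 1e6);
        return 0;
    }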